Speeding Document Annotation with Topic Models

نویسندگان

  • Forough Poursabzi-Sangdeh
  • Jordan L. Boyd-Graber
چکیده

Document classification and topic models are useful tools for managing and understanding large corpora. Topic models are used to uncover underlying semantic and structure of document collections. Categorizing large collection of documents requires hand-labeled training data, which is time consuming and needs human expertise. We believe engaging user in the process of document labeling helps reduce annotation time and address user needs. We present an interactive tool for document labeling. We use topic models to help users in this procedure. Our preliminary results show that users can more effectively and efficiently apply labels to documents using topic model information.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجره‌های هم‌پوشان

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

متن کامل

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...

متن کامل

Global and Local Models for Multi-Document Summarization

In this paper we study the effectiveness of combining corpus-level (global) tag-topic models and target document set level local models for multi-document summarization. Recently tag-topic models that exploit both word level annotation (e.g. named entity type) and/or document level metadata (e.g. words related to topic categories) have been proposed to model documents tagged from two different ...

متن کامل

Optimizing Semantic Coherence in Topic Models

Large organizations often face the critical challenge of sharing information and maintaining connections between disparate subunits. Tools for automated analysis of document collections, such as topic models, can provide an important means for communication. The value of topic modeling is in its ability to discover interpretable, coherent themes from unstructured document sets, yet it is not un...

متن کامل

ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling

Effective text classification requires experts to annotate data with labels; these training data are time-consuming and expensive to obtain. If you know what labels you want, active learning can reduce the number of labeled documents needed. However, establishing the label set remains difficult. Annotators often lack the global knowledge needed to induce a label set. We introduce ALTO: Active L...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015